

Search for: All records

Creators/Authors contains: "Bailis, Peter"


  1. A common problem practitioners face is selecting rare events in a large dataset. Unfortunately, standard techniques ranging from pre-trained models to active learning do not leverage the proximity structure present in many datasets and can lead to worse-than-random results. To address this, we propose EZMODE, an algorithm for iterative selection of rare events in large, unlabeled datasets. EZMODE uses active learning to iteratively train classifiers, but, in contrast to standard uncertainty-based techniques, chooses the easiest positive examples to label. EZMODE also leverages proximity structure (e.g., temporal sampling) to find difficult positive examples. We show that EZMODE can outperform baselines by up to 130× on a novel, real-world, 9,000 GB video dataset.
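    A minimal sketch of the easiest-positive selection round described above, assuming precomputed feature vectors and at least one labeled example of each class; the function name, the logistic-regression proxy model, and the `budget` parameter are illustrative, not the paper's implementation:

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def select_easiest_positives(X, labeled_idx, y_labeled, unlabeled_idx, budget=10):
        """One EZMODE-style round: train on the labeled set, then pick the
        unlabeled examples the model is MOST confident are positive (the
        "easiest" positives), rather than the most uncertain ones."""
        clf = LogisticRegression(max_iter=1000).fit(X[labeled_idx], y_labeled)
        scores = clf.predict_proba(X[unlabeled_idx])[:, 1]
        ranked = np.argsort(-scores)          # highest positive confidence first
        return [unlabeled_idx[i] for i in ranked[:budget]]
    ```

    In a video setting, each newly labeled frame's temporal neighbors could then be added to the candidate set, exploiting the proximity structure the abstract mentions.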
  2. Many active learning and search approaches are intractable for large-scale industrial settings with billions of unlabeled examples. Existing approaches search globally for the optimal examples to label, scaling linearly or even quadratically with the size of the unlabeled data. In this paper, we improve the computational efficiency of active learning and search methods by restricting the candidate pool for labeling to the nearest neighbors of the currently labeled set instead of scanning over all of the unlabeled data. We evaluate several selection strategies in this setting on three large-scale computer vision datasets: ImageNet, OpenImages, and a de-identified and aggregated dataset of 10 billion publicly shared images provided by a large internet company. Our approach achieved similar mean average precision and recall to the traditional global approach while reducing the computational cost of selection by up to three orders of magnitude, enabling web-scale active learning.
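    The core restriction can be sketched in a few lines, assuming an embedding matrix small enough for brute-force distances; at billion-example scale, an approximate nearest-neighbor index would replace the inner loop. Names are illustrative:

    ```python
    import numpy as np

    def restricted_candidate_pool(embeddings, labeled_idx, k=100):
        """Build the labeling candidate pool from the k nearest neighbors
        of each labeled example instead of scanning all unlabeled data,
        so selection cost scales with the labeled set, not the corpus."""
        labeled = set(labeled_idx)
        pool = set()
        for i in labeled_idx:
            dists = np.linalg.norm(embeddings - embeddings[i], axis=1)
            for j in np.argsort(dists)[:k + 1]:   # +1 skips the point itself
                if int(j) not in labeled:
                    pool.add(int(j))
        return sorted(pool)
    ```

    Any selection strategy (uncertainty sampling, greedy search, and so on) then scores only this pool rather than the full unlabeled set.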
  3. Self-training is a standard approach to semi-supervised learning where the learner's own predictions on unlabeled data are used as supervision during training. In this paper, we reinterpret this label assignment process as an optimal transportation problem between examples and classes, wherein the cost of assigning an example to a class is mediated by the current predictions of the classifier. This formulation facilitates a practical annealing strategy for label assignment and allows for the inclusion of prior knowledge on class proportions via flexible upper bound constraints. The solutions to these assignment problems can be efficiently approximated using Sinkhorn iteration, thus enabling their use in the inner loop of standard stochastic optimization algorithms. We demonstrate the effectiveness of our algorithm on the CIFAR-10, CIFAR-100, and SVHN datasets in comparison with FixMatch, a state-of-the-art self-training algorithm. Our code is publicly available on GitHub.
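    A minimal sketch of the Sinkhorn assignment step, assuming classifier log-probabilities for a batch of unlabeled examples and a known class-proportion vector; the paper's flexible upper-bound constraints are simplified here to exact column marginals:

    ```python
    import numpy as np

    def sinkhorn_soft_labels(log_probs, class_marginal, eps=0.05, n_iters=100):
        """Treat label assignment as optimal transport between examples
        (rows) and classes (columns) with cost -log_probs, approximated
        by Sinkhorn iteration."""
        n = log_probs.shape[0]
        # Subtract the row max for numerical stability before exponentiating.
        K = np.exp((log_probs - log_probs.max(axis=1, keepdims=True)) / eps)
        r = np.full(n, 1.0 / n)                # each example carries equal mass
        c = np.asarray(class_marginal)         # prior class proportions, sums to 1
        v = np.ones(K.shape[1])
        for _ in range(n_iters):
            u = r / (K @ v)
            v = c / (K.T @ u)
        P = u[:, None] * K * v[None, :]        # transport plan
        return P / P.sum(axis=1, keepdims=True)  # rows become soft pseudo-labels
    ```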
  4. Systems for ML inference are widely deployed today, but they typically optimize ML inference workloads using techniques designed for conventional data serving workloads and miss critical opportunities to leverage the statistical nature of ML. In this paper, we present WILLUMP, an optimizer for ML inference that introduces two statistically-motivated optimizations targeting ML applications whose performance bottleneck is feature computation. First, WILLUMP automatically cascades feature computation for classification queries: WILLUMP classifies most data inputs using only high-value, low-cost features selected through empirical observations of ML model performance, improving query performance by up to 5× without statistically significant accuracy loss. Second, WILLUMP accurately approximates ML top-K queries, discarding low-scoring inputs with an automatically constructed approximate model and then ranking the remainder with a more powerful model, improving query performance by up to 10× with minimal accuracy loss. WILLUMP automatically tunes these optimizations’ parameters to maximize query performance while meeting an accuracy target. Moreover, WILLUMP complements these statistical optimizations with compiler optimizations to automatically generate fast inference code for ML applications. We show that WILLUMP improves the end-to-end performance of real-world ML inference pipelines curated from major data science competitions by up to 16× without statistically significant loss of accuracy.
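    The first (cascade) optimization can be sketched as follows, assuming a cheap model over the selected high-value features and confidence thresholds tuned offline to an accuracy target; the names and threshold values are illustrative:

    ```python
    import numpy as np

    def cascade_classify(X_cheap, X_full, cheap_model, full_model, lo=0.1, hi=0.9):
        """Feature-cascade inference: score every input with cheap
        features, accept confident predictions, and compute the expensive
        features only for the remaining hard inputs."""
        p = cheap_model.predict_proba(X_cheap)[:, 1]
        preds = (p >= hi).astype(int)          # confident positives; rest default 0
        hard = (p > lo) & (p < hi)             # not confidently decided either way
        if hard.any():
            preds[hard] = full_model.predict(X_full[hard])
        return preds
    ```

    Because most inputs exit at the first stage, the expensive feature computation is paid for only the hard minority, which is where the reported speedups come from.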
  5. How can prior knowledge about the transformation invariances of a domain be incorporated into the architecture of a neural network? We propose Equivariant Transformers (ETs), a family of differentiable image-to-image mappings that improve the robustness of models towards predefined continuous transformation groups. Through the use of specially derived canonical coordinate systems, ETs incorporate functions that are equivariant by construction with respect to these transformations. We show empirically that ETs can be flexibly composed to improve model robustness towards more complicated transformation groups involving several parameters. On a real-world image classification task, ETs improve the sample efficiency of ResNet classifiers, achieving relative improvements in error rate of up to 15% in the limited-data regime while increasing model parameter count by less than 1%.
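    The classic instance of a canonical coordinate system is the log-polar map, under which rotation and scaling about the image center become translations; the sketch below assumes a single-channel `numpy` image (the paper derives such coordinate systems for more general transformation groups):

    ```python
    import numpy as np
    from scipy.ndimage import map_coordinates

    def log_polar(img, n_r=64, n_theta=64):
        """Resample an image onto log-polar canonical coordinates, where
        rotations and scalings about the center turn into translations,
        so a translation-equivariant CNN becomes equivariant to them."""
        h, w = img.shape
        cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
        log_r = np.linspace(0.0, np.log(min(cy, cx)), n_r)
        theta = np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False)
        R, T = np.meshgrid(np.exp(log_r), theta, indexing="ij")
        rows, cols = cy + R * np.sin(T), cx + R * np.cos(T)
        return map_coordinates(img, [rows, cols], order=1)  # bilinear sampling
    ```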
  6. What learning algorithms can be run directly on compressively-sensed data? In this work, we consider the question of accurately and efficiently computing low-rank matrix or tensor factorizations given data compressed via random projections. We examine the approach of first performing factorization in the compressed domain and then reconstructing the original high-dimensional factors from the recovered (compressed) factors. In both the matrix and tensor settings, we establish conditions under which this natural approach will provably recover the original factors. While it is well known that random projections preserve a number of geometric properties of a dataset, our work can be viewed as showing that they can also preserve certain solutions of non-convex, NP-hard problems like non-negative matrix factorization. We support these theoretical results with experiments on synthetic data and demonstrate the practical applicability of compressed factorization on real-world gene expression and EEG time series datasets.
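    The reconstruction step can be illustrated in isolation: given the compressed image of a sparse factor (as a factorization in the projected domain would return), l1-regularized regression decodes the high-dimensional factor. A toy sketch under Gaussian-projection and sparsity assumptions; all parameter values are illustrative:

    ```python
    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    d, m, nnz = 1000, 200, 20            # ambient dim, sketch size, sparsity

    h = np.zeros(d)                      # a sparse non-negative factor row
    h[rng.choice(d, size=nnz, replace=False)] = rng.random(nnz) + 0.5
    A = rng.standard_normal((m, d)) / np.sqrt(m)   # random projection
    hc = A @ h                           # compressed factor

    # Decode the original factor by sparse (compressed-sensing-style) recovery.
    h_hat = Lasso(alpha=1e-3, max_iter=10000, positive=True).fit(A, hc).coef_
    print(np.linalg.norm(h_hat - h) / np.linalg.norm(h))  # small relative error
    ```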
  7. We introduce a new sub-linear space sketch, the "Weight-Median Sketch," for learning compressed linear classifiers over data streams while supporting the efficient recovery of large-magnitude weights in the model. This enables memory-limited execution of several statistical analyses over streams, including online feature selection, streaming data explanation, relative deltoid detection, and streaming estimation of pointwise mutual information. Unlike related sketches that capture the most frequently occurring features (or items) in a data stream, the Weight-Median Sketch captures the features that are most discriminative of one stream (or class) compared to another. The Weight-Median Sketch adopts the core data structure used in the Count-Sketch but, instead of sketching counts, it captures sketched gradient updates to the model parameters. We provide a theoretical analysis that establishes recovery guarantees for batch and online learning, and demonstrate empirical improvements in memory-accuracy trade-offs over alternative memory-budgeted methods, including count-based sketches and feature hashing.
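    A compact sketch of the data structure, assuming integer feature ids and simple 2-universal hashing; the depth, width, and hash construction are illustrative rather than the paper's tuned configuration:

    ```python
    import numpy as np

    class WeightMedianSketch:
        """Count-Sketch-style table that accumulates gradient updates to a
        linear model's weights instead of counts; a weight is read back as
        the median of its signed counters across rows, so large-magnitude
        (most discriminative) weights survive in sub-linear space."""
        P = 2_147_483_647                        # Mersenne prime for hashing

        def __init__(self, depth=5, width=1024, seed=0):
            rng = np.random.default_rng(seed)
            self.table = np.zeros((depth, width))
            self.width = width
            self.a = rng.integers(1, self.P, depth)   # bucket hash params
            self.b = rng.integers(0, self.P, depth)
            self.c = rng.integers(1, self.P, depth)   # sign hash params
            self.d = rng.integers(0, self.P, depth)

        def _bucket(self, r, j):
            return int((self.a[r] * j + self.b[r]) % self.P) % self.width

        def _sign(self, r, j):
            return 1.0 if (self.c[r] * j + self.d[r]) % self.P % 2 else -1.0

        def update(self, feature_ids, grads):
            """Apply one sparse gradient step to the sketched weights."""
            for j, g in zip(feature_ids, grads):
                for r in range(len(self.table)):
                    self.table[r, self._bucket(r, j)] -= g * self._sign(r, j)

        def query(self, j):
            """Estimate weight j as the median of its signed counters."""
            ests = [self.table[r, self._bucket(r, j)] * self._sign(r, j)
                    for r in range(len(self.table))]
            return float(np.median(ests))
    ```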